A Study of Multilabel Text Classification and the Effect of Label Hierarchy

Authors

  • Sushobhan Nayak
  • Raghav Ramesh
  • Suril Shah

Abstract

Text classification has traditionally been one of the most popular problems in information retrieval, natural language processing and machine learning. In the simplest case, the task of text classification [1] is as follows. A set of training documents T = {X1, X2, ..., Xm} is used to learn a classification model. Each document is labelled with a class value from a set of k distinct labels {1, 2, ..., k}, and the model captures the relationship between the features of the documents and their labels. The model is then used to predict a label for a test document whose label is unknown. Text classification is used in many domains, for tasks such as (i) news selection and grouping, (ii) document organization in digital libraries, websites, social feeds, etc., and (iii) email classification, including spam filtering. In a variation of this problem, each document can belong to any number of classes (labels); this is referred to as a multilabel text classification problem. In a further extension, the labels are interrelated by a categorical hierarchy; this is referred to as a hierarchical text classification problem. In this project, we:

  • study the task of multilabel text classification on real-world datasets
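
The abstract does not say which classifiers the authors use. The following Python sketch therefore only illustrates the two problem settings it defines and is not the study's method. For the multilabel case it uses binary relevance: one independent yes/no classifier per label, here a multinomial Naive Bayes with add-one smoothing. It takes the hierarchy into account in one simple way, by closing every label set upward under a parent map, so a document tagged with a child label also carries all of its ancestors. The class `BinaryRelevanceNB`, the helper `with_ancestors`, the toy documents and the label hierarchy are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization (a deliberately minimal assumption).
    return text.lower().split()


def with_ancestors(labels, parent):
    """Close a label set upward under a hypothetical child -> parent map."""
    closed = set()
    for label in labels:
        # Walk up until reaching a root (no parent) or an already-closed label.
        while label is not None and label not in closed:
            closed.add(label)
            label = parent.get(label)
    return closed


class BinaryRelevanceNB:
    """Hypothetical binary-relevance model: one yes/no multinomial NB per label."""

    def fit(self, docs, label_sets):
        self.n = len(docs)
        self.labels = sorted(set().union(*label_sets))
        self.vocab = {w for d in docs for w in tokenize(d)}
        self.models = {}
        for label in self.labels:
            counts = {True: Counter(), False: Counter()}  # word counts per side
            n_docs = {True: 0, False: 0}                   # document counts per side
            for doc, labels in zip(docs, label_sets):
                has = label in labels
                counts[has].update(tokenize(doc))
                n_docs[has] += 1
            self.models[label] = (counts, n_docs)
        return self

    def _log_score(self, tokens, counts, n_class):
        # log P(side) + sum of log P(word | side), both add-one smoothed.
        total = sum(counts.values())
        score = math.log((n_class + 1) / (self.n + 2))
        for w in tokens:
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((counts[w] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, doc):
        tokens = tokenize(doc)
        predicted = set()
        for label, (counts, n_docs) in self.models.items():
            yes = self._log_score(tokens, counts[True], n_docs[True])
            no = self._log_score(tokens, counts[False], n_docs[False])
            if yes > no:  # each label is decided independently of the others
                predicted.add(label)
        return predicted


# Toy data and hierarchy, invented purely for illustration.
parent = {"football": "sports", "tennis": "sports", "elections": "politics"}
docs = [
    "the striker scored a goal in the football match",
    "the goalkeeper saved a penalty in the football final",
    "she won the tennis match in straight sets",
    "a tennis serve and a volley won the set",
    "voters went to the polls in the national elections",
    "the candidate conceded the elections after the vote count",
    "the minister announced a new tax policy",
]
raw_labels = [{"football"}, {"football"}, {"tennis"}, {"tennis"},
              {"elections"}, {"elections"}, {"politics"}]

# Train on hierarchy-closed label sets, then close predictions the same way.
label_sets = [with_ancestors(ls, parent) for ls in raw_labels]
model = BinaryRelevanceNB().fit(docs, label_sets)
prediction = with_ancestors(
    model.predict("the striker scored a goal in the football match"), parent)
print(sorted(prediction))  # → ['football', 'sports']
```

Binary relevance treats labels as independent, so in this sketch the upward closure is the only place the hierarchy enters. That is one design choice among several, and it is not necessarily the one studied here.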

Similar resources

Exploiting Associations between Class Labels in Multi-label Classification

Multi-label classification has many applications in text categorization, biology and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...

IN-DEDUCTIVE and DAG-Tree Approaches for Large-Scale Extreme Multi-label Hierarchical Text Classification

This paper presents a large-scale extreme multilabel hierarchical text classification method that employs a large-scale hierarchical inductive learning and deductive classification (IN-DEDUCTIVE) approach using different efficient classifiers, and a DAG-Tree that refines the given hierarchy by eliminating nodes and edges to generate a new hierarchy. We evaluate our method on the standard hierar...

Adaptive Large Margin Training for Multilabel Classification

Multilabel classification is a central problem in many areas of data analysis, including text and multimedia categorization, where individual data objects need to be assigned multiple labels. A key challenge in these tasks is to learn a classifier that can properly exploit label correlations without requiring exponential enumeration of label subsets during training or testing. We investigate no...

Effective and Efficient Multilabel Classification in Domains with Large Number of Labels

This paper contributes a novel algorithm for effective and computationally efficient multilabel classification in domains with large label sets L. The HOMER algorithm constructs a Hierarchy Of Multilabel classifiERs, each one dealing with a much smaller set of labels compared to L and a more balanced example distribution. This leads to improved predictive performance along with linear training ...

Multi-topic Text Categorization Based on Ranking Approach

This paper is devoted to the multi-topic (multilabel) text classification problem. We propose two methods for reduction from ranking to the multi-label case. Unlike existing multi-label classification methods based on reduction from ranking problem, where the complex classification (threshold) function is being defined on the input feature space, in our approach we propose the construction of s...

Publication date: 2013